Generative AI for computer vision

See:

VLMs

Deep CV

Resources

Models

Mochi (Genmo) - an open-source, state-of-the-art video generation model
- https://huggingface.co/genmo/mochi-1-preview
- genmo.ai
Infinite AI Artboard - Recraft
Midjourney
DALL·E (OpenAI)
- DALL·E 2
- DALL·E 3
Imagen (Google)
Imagen Video
Stable Diffusion
Make-A-Video
Leonardo.AI

Code

References

#PAPER Video Pixel Networks (Kalchbrenner 2016)
#PAPER Pixel RNNs - Pixel Recurrent Neural Networks (van den Oord 2016)
- Pixel RNN is a deep recurrent architecture that predicts the pixels of an image sequentially along the two spatial dimensions, using fast two-dimensional recurrent layers (Row LSTM, Diagonal BiLSTM) and residual connections. It models the joint distribution of pixels as a product of conditional distributions, with each pixel conditioned on the pixels above it and to its left. At publication, the model achieved state-of-the-art log-likelihood on natural images.
- https://medium.com/a-paper-a-day-will-have-you-screaming-hurray/day-4-pixel-recurrent-neural-networks-1b3201d8932d
- https://christineai.blog/pixelcnn-and-pixelrnn/
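
The factorization used by Pixel RNNs, p(x) = prod_i p(x_i | x_1, ..., x_{i-1}) in raster-scan order, can be illustrated with a minimal sketch. The function `toy_conditional` is a hypothetical stand-in for the recurrent network (a logistic function of the left and upper neighbours with arbitrary weights), and pixels are binary for simplicity; neither choice comes from the paper.

```python
import math
import random

def toy_conditional(image, row, col):
    """Hypothetical stand-in for the network: P(pixel = 1 | earlier pixels)."""
    left = image[row][col - 1] if col > 0 else 0
    up = image[row - 1][col] if row > 0 else 0
    # Logistic function of the two causal neighbours (weights are arbitrary).
    logit = -1.0 + 1.5 * left + 1.5 * up
    return 1.0 / (1.0 + math.exp(-logit))

def log_likelihood(image):
    """Sum of log p(x_i | x_<i) over all pixels in raster-scan order."""
    total = 0.0
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            p_one = toy_conditional(image, r, c)
            total += math.log(p_one if pixel == 1 else 1.0 - p_one)
    return total

def sample(height, width, rng):
    """Generate an image pixel by pixel, each drawn from its conditional."""
    image = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            image[r][c] = 1 if rng.random() < toy_conditional(image, r, c) else 0
    return image

if __name__ == "__main__":
    img = sample(4, 4, random.Random(0))
    print(img, log_likelihood(img))
```

Because every conditional only looks at pixels that come earlier in the scan order, the product is a properly normalized distribution, and sampling and exact likelihood evaluation use the same loop.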
#PAPER Conditional Image Generation with PixelCNN Decoders (van den Oord 2016)
#PAPER PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications (Salimans 2017)
- #CODE https://github.com/openai/pixel-cnn
- https://openreview.net/forum?id=BJrFC6ceg&noteId=Bkc_sOZ4l
#PAPER FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models (Grathwohl 2018)
#PAPER Generating Realistic Geology Conditioned on Physical Measurements with Generative Adversarial Networks (Dupont 2018)
- A generator G and a discriminator D are used to generate realistic images conditioned on a set of known pixels
- The total loss combines a prior loss (the generated image should receive a high score from D) and a context loss (the generated image should match the known pixels)
- For the context loss, a smoothed mask is used
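
A minimal sketch of how the two loss terms described above might be combined. Several details are assumptions, not taken from the paper: the prior loss is written as -log D(G(z)), the context loss as a weighted L1 distance, `lam` is a hypothetical weighting factor, and the smoothed mask is passed in as precomputed per-pixel weights (how the paper smooths it is not reproduced here).

```python
import math

def context_loss(generated, known, weights):
    """Weighted L1 distance between generated and known pixels.

    `weights` plays the role of the (smoothed) mask: 0 where nothing is known.
    """
    return sum(
        w * abs(g - k)
        for g_row, k_row, w_row in zip(generated, known, weights)
        for g, k, w in zip(g_row, k_row, w_row)
    )

def prior_loss(d_score, eps=1e-8):
    """Assumed form -log D(G(z)): near zero when D rates the image as realistic."""
    return -math.log(min(max(d_score, eps), 1.0))

def total_loss(generated, known, weights, d_score, lam=0.1):
    """Context term plus a weighted prior term (lam is a hypothetical weight)."""
    return context_loss(generated, known, weights) + lam * prior_loss(d_score)

if __name__ == "__main__":
    known = [[1.0, 0.0], [0.0, 0.0]]    # only the top-left pixel is measured
    weights = [[1.0, 0.0], [0.0, 0.0]]  # mask weights for the known pixels
    candidate = [[0.8, 0.3], [0.6, 0.1]]
    print(total_loss(candidate, known, weights, d_score=0.7))
```

In practice the loss would be minimized over the latent vector z with G and D held fixed, so that the generated image both looks realistic and honours the measurements.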
#PAPER Parametric generation of conditional geological realizations using generative neural networks (Chan 2019)
#PAPER Parametrization of Stochastic Inputs Using Generative Adversarial Networks With Application in Geology (Chan 2020)
#PAPER Generative Models as Distributions of Functions (Dupont 2021)
- Generative models are typically trained on grid-like data such as images, which ties them to the underlying grid resolution
- Instead of discretized grids, individual data points are parametrized by continuous functions, and the generative model learns a distribution over these functions
- Coordinate and feature pairs are treated as point clouds (sets with an underlying notion of distance), leveraging the PointConv framework
- The model can learn rich distributions of functions independently of data type and resolution. Application to AI/Computer Vision/Super-resolution
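
The change of representation described above, from a pixel grid to a set of coordinate and feature pairs, can be sketched as follows. The helper names and the normalization of coordinates to the unit square are assumptions for illustration; the learned model itself (and PointConv) is not shown.

```python
def image_to_point_cloud(image):
    """Turn a grid of pixel values into a list of ((x, y), feature) pairs."""
    height, width = len(image), len(image[0])
    points = []
    for r in range(height):
        for c in range(width):
            # Pixel centre, normalized to the unit square (an assumed convention).
            coord = ((c + 0.5) / width, (r + 0.5) / height)
            points.append((coord, image[r][c]))
    return points

def sample_function(f, height, width):
    """Evaluate a continuous function f(x, y) on a grid of any resolution."""
    return [
        [f((c + 0.5) / width, (r + 0.5) / height) for c in range(width)]
        for r in range(height)
    ]

if __name__ == "__main__":
    # Hypothetical "image" given as a continuous function of position.
    def f(x, y):
        return x * y

    coarse = sample_function(f, 2, 2)
    fine = sample_function(f, 8, 8)  # same function, higher resolution
    print(len(image_to_point_cloud(coarse)), len(image_to_point_cloud(fine)))
```

Since the function is defined on continuous coordinates, the same object can be rendered at 2x2 or 8x8, which is what makes the representation independent of resolution.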
#PAPER Score-Based Generative Modeling through Stochastic Differential Equations (Song 2021)
- #CODE https://paperswithcode.com/paper/score-based-generative-modeling-through-1
#PAPER Florence: A New Foundation Model for Computer Vision (Yuan 2021)
#PAPER Diverse Generation from a Single Video Made Possible (Haim 2021)
#PAPER Scaling Autoregressive Models for Content-Rich Text-to-Image Generation (Yu 2022)
- #CODE https://github.com/google-research/parti
- https://parti.research.google/
- Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge
#PAPER Autoregressive Image Generation using Residual Quantization (Lee 2022)
- #CODE https://github.com/kakaobrain/rq-vae-transformer
#PAPER MultiMAE: Multi-modal Multi-task Masked Autoencoders (Bachmann 2022)
- https://medium.com/syncedreview/epfls-multi-modal-multi-task-masked-autoencoder-a-simple-flexible-and-effective-vit-pretraining-3185c4aa62fc
#PAPER InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (Wang 2022)
#PAPER GenAI Arena: An Open Evaluation Platform for Generative Models (2024)
- https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena